
DNA Research

Oxford University Press (OUP)

Preprints posted in the last 30 days, ranked by how well they match DNA Research's content profile, based on 23 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Near-complete, haplotype-resolved genome assembly of common buckwheat (Fagopyrum esculentum Moench)

Hess, F.; Chen, Y.; Lopez Ortiz, M. E.; Colliquet, A.; Stoffel-Studer, I.; Mac, V.; Grob, S.; Koelliker, R.; Studer, B.

2026-04-01 genomics 10.64898/2026.03.30.715208 medRxiv
Top 0.1%
7.9%

Common buckwheat (Fagopyrum esculentum Moench) is a globally cultivated pseudocereal with a high nutritional quality and economic value. Due to its self-incompatibility, common buckwheat exhibits a high level of heterozygosity, making genome assembly challenging. Consequently, reference-level haplotype-resolved assemblies of common buckwheat are scarce, hindering research and genomics-assisted breeding. Here, we present a near-complete, chromosome-level, haplotype-resolved assembly of a common buckwheat F1 genotype (named Tuka), generated using a trio-binning approach that integrated parental Illumina short-read data with PacBio HiFi and Hi-C data from Tuka. The Tuka assembly comprises two haplomes, Tuka_h1 and Tuka_h2, both showing high contiguity (contig N50 of 76.68 Mb and 84.57 Mb, respectively), high completeness (assembly sizes of 1.28 Gb and 1.23 Gb with BUSCO scores of 96.9% and 96.8%, respectively), high base-level accuracy (QV of 59.08 and 63.03, respectively), and few gaps (35 and 30, respectively). This near-complete assembly of Tuka serves as a valuable genomic resource for common buckwheat, enabling advanced genomic analyses and accelerating research and breeding using state-of-the-art genomic tools.
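The contiguity and accuracy figures quoted in this abstract (contig N50, QV) follow standard definitions, which the minimal sketch below illustrates. It is not the authors' pipeline: the function names `n50` and `qv_to_error_rate` are hypothetical, and the contig lengths in the usage note are toy values. The sketch assumes QV is a Phred-scaled consensus quality, so a QV of about 60 corresponds to roughly one expected error per million bases.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)
```

For example, `n50([10, 8, 6, 4, 2])` returns 8, because the two longest toy contigs (10 + 8) already cover at least half of the total length of 30.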

2
A reference genome assembly for Quercus canariensis Willd

Couturier, F.; Cravero, C.; Lesur, I.; Confais, J.; Belmonte, E.; Piat, L.; Marande, W.; Rellstab, C.; Valbuena, M.; Saez-Laguna, E.; Duvaux, L.

2026-04-01 genetics 10.64898/2026.03.31.714748 medRxiv
Top 0.1%
4.9%

We present a genome assembly from a specimen of Quercus canariensis (Fagaceae; Fagales; Magnoliopsida). The assembly was generated using PacBio HiFi long reads with an approximate sequencing depth of 39X and scaffolded using a reference-guided approach. The genome sequence has a total length of 816.0 megabases for haplotype 1 and 804.8 megabases for haplotype 2. The two haplotypes are each resolved into 12 chromosomal pseudomolecules, with only 3.48% and 1.36% of sequences remaining unplaced in haplotypes 1 and 2, respectively. Assembly completeness is supported by BUSCO scores of 98.3% and 98.2% complete genes for haplotypes 1 and 2, respectively. Structural annotation identified 51,882 and 46,482 protein-coding genes in haplotypes 1 and 2, respectively. This genome assembly provides the first chromosome-scale reference genome for Q. canariensis, laying the base for future genomic and evolutionary studies in this understudied species of the hybridizing white oak species complex.

Taxonomy lineage: cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; rosids; fabids; Fagales; Fagaceae; Quercus. EBI:txid568684 Quercus canariensis Willd. 1809 (Willdenow)

3
A near chromosome-scale genome assembly of the Common pine sawfly (Diprion pini, Linnaeus, 1758)

Wutke, S.; Michell, C.; Lindstedt, C.

2026-03-21 genomics 10.64898/2026.03.19.712881 medRxiv
Top 0.1%
3.8%

The common pine sawfly, Diprion pini, is a widespread defoliator of pine forests across Europe and Asia, with outbreaks causing substantial ecological and economic damage. However, genomic resources for this species have been limited, hindering advances in molecular ecology and pest management. Here, we present a near chromosome-level reference genome for D. pini, generated using PacBio HiFi reads, Oxford Nanopore MinION long reads, and 10x Genomics linked reads. The final assembly is organized into mostly chromosome-sized scaffolds. It spans a length of 268 Mb, comprises 81 scaffolds, and has a scaffold N50 of 18.7 Mb. BUSCO analysis (hymenoptera_odb10) indicates a high genome completeness of 97.2%. At 22.7 kb, the mitochondrial genome is unusually large due to an extended non-coding control region (6,874 bp). Gene prediction identified 26,335 protein-coding genes, of which 12,769 were functionally annotated. Comparative analyses with other sawflies and Apocrita identified 2,472 proteins unique to D. pini, some of which are putatively associated with the processing of plant secondary metabolites. Notably, our genome assembly highlights that, when a closely related, high-quality reference genome is available, chromosome-scale assemblies can be generated without the need for Hi-C sequencing. The genome provides a valuable foundation for the development of improved monitoring and management strategies for D. pini outbreaks and contributes to advancing fundamental research on Hymenoptera evolution.

4
Reference genomes of four miniature and non-miniature cypriniform fishes inhabiting acidic peat-swamp forest blackwaters of Southeast Asia

Sudasinghe, H.; Liu, Z.; Triginer-Llabres, L.; Hui Tan, H.; Britz, R.; Salzburger, W.; Peichel, C.; Rueber, L.

2026-03-24 genomics 10.64898/2026.03.21.713365 medRxiv
Top 0.1%
3.6%

The acidic blackwaters of Southeast Asia's peat-swamp forests represent some of the most extreme freshwater environments on Earth. Despite their very low pH values, limited nutrients, and hypoxic conditions, these blackwater habitats harbor a remarkable diversity of freshwater fishes, including multiple lineages that have independently adapted to these extreme conditions and, in some cases, exhibit extreme body miniaturization. These replicate evolutionary lineages therefore provide a powerful comparative framework to investigate adaptation to extreme environments and the genomic basis of miniaturization. Here, we present high-quality, annotated reference genomes for four cypriniform species endemic to these peat-swamp forest ecosystems: Paedocypris sp., Sundadanio atomus, Boraras brigittae, and Rasbora kalochroma. The first two are progenetic miniatures, including Paedocypris, comprising the smallest known fish, while B. brigittae represents a proportioned dwarf and R. kalochroma a non-miniature taxon. Genome sizes ranged from 401 to 1,290 Mb and heterozygosity from 0.34 to 1.7%. All genome assemblies achieved pseudo-chromosome-level contiguity, high k-mer completeness (>99%), and high BUSCO completeness (94.5-98.9%). Repeat analyses revealed lineage-specific differences in transposable element landscapes and abundances, while gene annotation identified notable intron length reduction in progenetic miniatures.

5
Genome-wide characterization of extant clonal diversity in Chilean Carmenere

Garcia, J.; Cochetel, N.; Balic, J.; Barros, S.; Figueroa-Balderas, R.; Castro, A.; Cantu, D.

2026-04-07 genomics 10.64898/2026.04.03.716224 medRxiv
Top 0.2%
2.1%

Carmenere is a widely cultivated and internationally recognized grapevine cultivar in Chile, yet genetic variation among its clones remains poorly characterized. Early studies based on SSR and AFLP markers detected limited polymorphism, but these approaches interrogate only a small fraction of the genome, leaving the extent of clonal diversity unresolved. Here, we generated an improved chromosome-scale diploid genome assembly of Carmenere FPS clone 02 and characterized clonal genomic diversity by sequencing 36 biological replicates representing 12 clones maintained in Chile, including heritage selections rescued from old producer vineyards by Vina Santa Carolina as part of its Bloque Herencia conservation program, and commercial nursery-derived clones. Focusing on low-frequency variants and using replicate-aware consensus calling, we identified more than 9,000 private single nucleotide variants (SNVs) and small indels per clone, providing high-resolution markers for clonal identification. Although most variants were located in repetitive or intergenic regions, a subset affected coding sequences, with genes involved in plant-pathogen interactions, transport, and secondary metabolism most frequently impacted. While variant-affected genes associated with wine anthocyanin content, TA, pH, and alcohol percentage were identified, broader phenotypic characterization will be required to assess their biological significance. Overall, this study provides a genome-wide characterization of extant clonal diversity in Carmenere, with implications for clonal selection and genetic resource conservation.

6
Genome sequence of Tacca chantrieri reveals the genetic basis of floral pigmentation

de Oliveira, J. A. V. S.; Pucker, B.

2026-03-19 plant biology 10.64898/2026.03.17.712415 medRxiv
Top 0.2%
1.9%

Tacca chantrieri, the black bat flower, has showy flowers that often appear almost black. Here, we present the genome sequence and corresponding annotation to identify the genetic basis of the pigmentation. Candidate genes associated with anthocyanin biosynthesis were identified based on this genome sequence and investigated with respect to their properties. The best dihydroflavonol 4-reductase (DFR) candidate, which harbours all amino acid residues believed to be required for DFR activity, shows a threonine in the substrate preference determining position where most characterized DFRs display asparagine or aspartate. A comprehensive investigation revealed that this amino acid residue appears to be frequent in the Dioscoreaceae family.

7
NanoPlasmiQC: Full plasmid sequencing with ONT long-reads and automatic data analysis

de Oliveira, J. A. V. S.; Ng, V.; Wolff, K.; Pucker, B.

2026-04-03 genomics 10.64898/2026.04.01.715842 medRxiv
Top 0.2%
1.8%

Long-read sequencing has developed rapidly in recent years. It has been established as the standard method for the sequencing of plant genomes and has also gained importance for full plasmid sequencing. As Sanger sequencing has a limited read length of about 1 kb, long-read sequencing offers a great advantage, as the full plasmid can be sequenced in one read. Here, we present a cost-effective workflow to sequence full plasmids and compare the results against an expectation. The per-plasmid cost of this workflow is determined by the number of plasmids investigated simultaneously, but can be lower than the price of a single Sanger sequencing reaction. We developed a workflow for automatic data processing, which allows us to complete sequencing and data analysis within a day.

8
A chromosome-level, haplotype-resolved genome assembly for the barn owl, Tyto alba

Corval, H.; Ducrest, A.-L.; Bachmann Salvy, M.; Burns, A.; Topaloudis, A.; Simon, C.; Cora, E.; Cavaleri, D.; Almasi, B.; Roulin, A.; Iseli, C.; Guex, N.; Cumer, T.; Goudet, J.

2026-03-23 genomics 10.64898/2026.03.20.713190 medRxiv
Top 0.3%
1.3%

Recent advances in long-read sequencing have enabled near telomere-to-telomere (T2T) assemblies across diverse taxa. However, avian genomes remain challenging due to numerous microchromosomes: small elements, typically < 20 Mb, that are gene-, GC-, and repeat-rich. As a consequence, microchromosomes are often missing from genome assemblies. Here, we present a chromosome-level, haplotype-resolved genome assembly for the Western barn owl (Tyto alba). Using a trio-binning strategy with Illumina parental reads combined with PacBio HiFi and Oxford Nanopore Technologies data, we generated two phased contig sets. These were scaffolded into 40 linkage groups using a linkage map. Comparative analyses identified unplaced HiFi scaffolds corresponding to microchromosomes, which we integrated into six additional microchromosomes using long-read information. The two assemblies present 46 chromosomes, matching the karyotype of the species. They exhibit strong synteny between parental haplotypes, except for a ~38 Mb complex region on chromosome 7 containing nested inversions. This high-quality reference provides the first haplotype-resolved and chromosome-level genome for Strigiformes, enabling fine-scale studies of structural variation and avian genome evolution.

9
Integration of QTL Mapping, Transcriptomics, and Genome Resequencing Identifies Yield-Associated Genes for Salt Stress in Rice

Kumar, N.; Singh, B. P.; Mishra, P.; Rani, M.; Gurjar, A.; Mishra, A.; Shah, A.; Gadol, N.; Tiwari, S.; Rathor, S.; Sharma, P. C.; Krishnamurthy, S. L.; Takabe, T.; Mitsuya, S.; Kalia, S.; Singh, N. K.; Rai, V.

2026-04-01 plant biology 10.64898/2026.03.31.715716 medRxiv
Top 0.4%
1.1%

Salinity and sodicity stresses adversely affect rice growth and yield. To overcome yield losses, suitable tolerant rice cultivars can be developed through a marker-assisted breeding (MAB) program. In the present study, genomic regions associated with sodicity stress tolerance at the reproductive stage were identified using a high-density 50k SNP array in a recombinant inbred line (RIL) population derived from the contrasting rice genotypes CSR11 and MI48. A total of 50 QTLs were detected for various yield-related traits; further, 19 QTLs with ≥15% of phenotypic variance were selected for integrated (omics) analysis. RNA sequencing of leaves and panicles at the reproductive stage under sodic stress conditions was employed to find differentially expressed genes. Resequencing identified a total of 1368 and 1410 SNPs and 104 and 144 indels for MI48 and CSR11, respectively, within the QTL regions. On chromosomes 1 and 6, colocalized QTLs (qPH1-1/qGP1-1 and qGP6-2/qSSI6-2) were discovered. Differentially expressed genes (DEGs) were mapped over the QTL regions selected, and SNP variations and indels were screened for colocalized QTLs. Potential candidate genes, namely Os-pGlcT1 (Os01g0133400), OsHKT2;1 (Os06g0701600) and OsHKT2;4 (Os06g0701700), OsANTH12 (Os06g0699800), and OsPTR2 (Os06g0706400), were identified as being responsible for glucose transport, ion homeostasis, pollen germination, and nitrogen use efficiency, respectively, under salt stress. Finally, our study provides important insights into the genes and potential mechanisms affecting grain yield under sodic stress in rice, which will contribute to the development of molecular markers for rice breeding programs.

10
Development of the 4TREE SNP array, a forest multispecies array to enhance European Breeding and conservation programs in pine, poplar and ash.

Guilbaud, R.; Bagnoli, F.; Ben-Sadoun, S.; Biselli, C.; Buret, C.; Buiteveld, J.; Cativelli, L.; Copini, P.; Drouaud, J.; Esselink, D.; Fricano, A.; Benoit, V.; Kelly, L. J.; Kodde, L.; Metheringham, C. L.; Pinosio, S.; Rogier, O.; Segura, V.; Spanu, I.; Tumino, G.; Buggs, R. J.; Gonzalez-Martinez, S. C.; Vietto, L.; Nervo, G.; Jorge, V.; Dowkiw, A.; Smulders, M. J.; Sanchez, L.; Vendramin, G. G.; Bastien, C.; Faivre Rampant, P.

2026-03-23 genomics 10.64898/2026.03.21.711309 medRxiv
Top 0.4%
0.9%

Within the framework of the European Adaptive BREEDING for Better FORESTs project (B4EST, https://b4est.eu/), we have developed genotyping tools for Poplar, Ash, and Pine forest tree species. SNP arrays are attractive genotyping tools because of the user-friendly genotype calling system and the robust transferability among laboratories. Here we describe the development of an Axiom SNP array for Pinus pinaster (13,407 SNPs), Pinus pinea (5,671 SNPs), Poplar spp. (13,408 SNPs), and Fraxinus spp. (13,407 SNPs) based on a two-step process. We first assembled a high-density (>100,000 SNPs/species) screening array that served to test a large panel of candidate SNPs on a diversity panel involving at least 120 individual trees per species or species group. In the second step, we selected and combined the most informative SNPs to build the final 50,000 SNP 4TREE array. This approach resulted in high genotyping success rates, including for species lacking previously validated high-quality SNP resources. The 4TREE SNP array provides a valuable and transferable genomic tool to support genomic prediction, breeding, and adaptive management of forest tree species.

11
Transposons Triggered Dynamic Evolution of MKK3 Gene, a Key Regulator for Seed Dormancy in Barley

Tressel, L. G.; Caspersen, A. M.; Walling, J. G.; Gao, D.

2026-03-25 plant biology 10.64898/2026.03.23.713676 medRxiv
Top 0.4%
0.9%

Barley (Hordeum vulgare L.) is an important crop worldwide, and its seed dormancy is primarily controlled by a Mitogen-Activated Protein Kinase Kinase 3 (MKK3) gene. Although kinase activity of MKK3 and its roles in barley post-domestication have been widely studied, the pre-domestication evolution of MKK3 and the spread of nondormant alleles among global barley varieties remain largely unexplored. In this study, we analyzed MKK3 sequences in barley and its wild progenitor (H. spontaneum) and identified two polymorphic miniature inverted-repeat transposable elements (MITEs). Comparative analyses indicated that the insertion/excision of the MITEs predated the current estimates of barley domestication. Examination of the barley pangenomes coupled with droplet digital (dd) PCR revealed extensive copy number variation of MKK3 and suggested that transposons likely drove tandem amplification of the MKK3 gene on chromosome 5H. Additionally, approximately 1-kb MKK3 sequences were found on chromosomes 1H and 6H. Further analysis indicated that these short MKK3 sequences were captured by a CACTA transposon that also contained fragments from four other expressed genes. The acquisition of MKK3 was estimated to have occurred between 1.9 and 2.5 million years ago. Together, these findings illuminate the dynamic pre-domestication evolution of the MKK3 gene and suggest three independent origins of highly nondormant barley worldwide, including a unique lineage predominant in Ethiopian germplasm. This study reveals the pivotal roles of transposons in MKK3 evolution and provides helpful information for understanding the complex history of the MKK3 gene in barley and for improving preharvest sprouting (PHS)-tolerant varieties under distinct natural conditions.

12
A tool to shoot genes with massive air from a compressor (TSGMAC)

Tsugama, D.

2026-03-26 plant biology 10.64898/2026.03.24.713841 medRxiv
Top 0.4%
0.8%

Particle bombardment systems are widely used for plant transformation, but commercial devices are expensive and rely on high-pressure helium gas. This study aimed to develop a cost-effective and helium gas-free alternative using an air duster gun connected to a commercial compressor. A nozzle (for DNA with transgenes), gold particles (as DNA carriers), nozzle-to-sample distance, and a method for coating gold particles with DNA were optimized to yield better transformation efficiency in targeting onion epidermal cells and rice calli. From the rice calli transformed with the newly developed system (a tool to shoot genes with massive air from a compressor: TSGMAC), stable transgenic plants could be obtained. TSGMAC offers a low-cost and helium gas-free solution for plant transformation and genome editing and can enhance accessibility to particle bombardment-based techniques.

13
Whole-genome pre-amplification as a viable approach for genomic screening of FFPE-derived DNA samples

Guerrero Quiles, C.; Lodhi, T.; Sellers, R.; Sahoo, S.; Weightman, J.; Breitwieser, W.; Sanchez Martinez, D.; Bartak, M.; Shamim, A.; Lyons, S.; Reeves, K.; Reed, R.; Hoskin, P.; West, C.; Forker, L.; Smith, T.; Bristow, R.; Wedge, D. C.; Choudhury, A.; Biolatti, L. V.

2026-03-29 molecular biology 10.64898/2026.03.26.714414 medRxiv
Top 0.6%
0.7%

Whole-genome sequencing (WGS) enables comprehensive analysis of tumour genomes, but its use in formalin-fixed paraffin-embedded (FFPE) samples is limited by DNA fragmentation and low yields. Whole-genome amplification (WGA) methods such as multiple displacement amplification (MDA) can boost DNA availability but distort copy-number alteration (CNA) profiles. DNA ligation-mediated MDA (DLMDA) mitigates this bias by reconstituting fragmented templates, yet its performance in FFPE-derived DNA remains uncertain. We compared paired DLMDA pre-amplified (2h, 8h) and non-pre-amplified FFPE prostate tumour samples from 22 archival blocks (5, 15 and 20 years old). DLMDA increased DNA yield by 42- to 86-fold, with global CNA patterns largely preserved. However, DLMDA significantly reduced the number of detected CNA deletions and amplifications. These effects were independent of both block age and reaction time. CNA dropouts were randomly distributed across the genome, indicating that DLMDA does not introduce regional bias. Our results show that DLMDA enables robust DNA yield recovery and avoids false-positive CNA artefacts, but at the cost of reduced CNA sensitivity. While suitable for CNA screening pipelines through WGS, further improvements are required to minimise the false-negative risk and improve the technique's sensitivity for FFPE-based genomics.

14
The genome of the Delisea pulchra: a resource for the study of chemical host-microbe interactions in red algae

Dittami, S. M.; Hudson, J.; Brillet-Gueguen, L.; Ficko-Blean, E.; Tanguy, G.; Rousvoal, S.; Legeay, E.; Markov, G. V.; Delage, L.; Godfroy, O.; Corre, E.; Collen, J.; Leblanc, C.; Egan, S.

2026-04-02 genomics 10.64898/2026.03.31.715562 medRxiv
Top 0.6%
0.6%

Background: Red macroalgae (Rhodophyta) are ecologically and economically important marine primary producers, yet genomic resources for most species remain scarce. Delisea pulchra, a temperate red alga known for its halogenated furanone-based chemical defenses, serves as a model for studying algal-microbe interactions, antifouling mechanisms, and disease dynamics. Results: Here we present a high-quality genome assembly of this species. The nuclear genome comprises 134 Mbp across 271 contigs with an N50 of 1.47 Mbp and encodes 13,387 predicted protein-coding genes. Comparative genomics with other red algae revealed expansions in gene families involved in DNA methylation and oxidative stress responses, including glutathione S-transferases and superoxide dismutases. Analysis of glycosyltransferases, sulfatases, and sulfurylases implicated in galactan biosynthesis suggests D. pulchra possesses a complex and potentially novel extracellular matrix. We also identified several vanadium haloperoxidases (vHPOs), heme-dependent haloperoxidases (hHPOs), and two type III polyketide synthase (PKS) genes unique to D. pulchra, which together represent promising candidate genes for bromofuranone production. Conclusion: The D. pulchra genome provides a foundation for molecular investigations into defense, signaling, and host-microbe interactions. It has been deposited at the European Nucleotide Archive under accession number PRJEB101077. All datasets, annotations, and interactive tools for exploring the genome are also available through the Rhodoexplorer portal at https://rhodoexplorer.sb-roscoff.fr.

15
Multi-trait Multi-environment Genomic Prediction Strategies for Miscanthus sacchariflorus Populations

Proma, S.; Garcia-Abadillo, J.; Sagae, V. S.; Sacks, E.; Leakey, A. D. B.; Zhao, H.; Ghimire, B. K.; Lipka, A. E.; Njuguna, J. N.; Yu, C. Y.; Seong, E. S.; Yoo, J. H.; Nagano, H.; Anzoua, K. G.; Yamada, T.; Chebukin, P.; Jin, X.; Clark, L. V.; Petersen, K. K.; Peng, J.; Sabitov, A.; Dzyubenko, E.; Dzyubenko, N.; Glowacka, K.; Nascimento, M.; Campana Nascimento, A. C.; Dwiyanti, M. S.; Bagment, L.; Shaik, A.; Jarquin, D.

2026-03-23 genomics 10.64898/2026.03.18.712730 medRxiv
Top 0.7%
0.5%

Genomic selection holds the potential to serve as a strategic tool to enhance the genetic gain of complex traits in Miscanthus breeding programs. The development of improved cultivars requires their assessment for various traits across diverse environments to ensure suitable overall performance. Hence, multi-trait multi-environment (MTME) genomic prediction (GP) models offer an opportunity to improve selection accuracy. This study aims to evaluate the potential of five GP models: (1) three MTME models including genotype-by-trait-by-environment interaction (GxExT) and (2) two single-trait multi-environment (STME) models (with and without GxE interaction). A Miscanthus sacchariflorus population comprising 336 genotypes evaluated in three environments and scored for four traits (biomass yield YDY, total culm number TCM, average internode length AIL, and culm node number CNN) was analyzed. The predictive ability of the models was evaluated considering three cross-validation schemes resembling realistic scenarios (CV1: predicting new genotypes, CVP: predicting missing traits in a given environment, and CV2: predicting partially observed genotypes). On average across all cross-validation schemes, the predictive ability of the MTME models was 10% to 70% higher than that of the STME models for TCM and AIL. On the other hand, for YDY and CNN, both STME models performed similarly or slightly better (by 5% to 64%) than the MTME models in most environments. While the MTME models were not successful for all traits when compared to their STME counterparts, MTME models improved the prediction of the performance of genotypes that were untested across environments or lacked trait information in a specific environment. Overall, our study suggests that MTME GP models can be implemented in Miscanthus breeding programs to improve the predictive ability for complex traits, shorten breeding cycles, and accelerate selection decisions.
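The difference between the CV1 and CV2 schemes named in this abstract comes down to how records are assigned to folds. The sketch below shows one common way to build such folds; it is an illustration under stated assumptions, not the authors' procedure. The helper names `cv1_folds` and `cv2_folds`, the `(genotype, environment)` record layout, and the toy population size are all hypothetical, and the CVP scheme is not covered.

```python
import random


def cv1_folds(records, k, seed=0):
    """CV1 (new genotypes): every record of a genotype shares one fold,
    so a test genotype is never seen in any training environment."""
    genotypes = sorted({genotype for genotype, _ in records})
    rng = random.Random(seed)
    rng.shuffle(genotypes)
    fold_of = {genotype: i % k for i, genotype in enumerate(genotypes)}
    return [fold_of[genotype] for genotype, _ in records]


def cv2_folds(records, k, seed=0):
    """CV2 (partially observed genotypes): individual genotype-environment
    cells are assigned to folds, so a test genotype may still have
    training records from other environments."""
    order = list(range(len(records)))
    rng = random.Random(seed)
    rng.shuffle(order)
    folds = [0] * len(records)
    for rank, index in enumerate(order):
        folds[index] = rank % k
    return folds


# Toy data: 6 hypothetical genotypes, each scored in 3 environments.
records = [(f"G{g}", f"E{e}") for g in range(6) for e in range(3)]
genotype_level = cv1_folds(records, k=3)
cell_level = cv2_folds(records, k=3)
```

Grouping by genotype in CV1 prevents information about a test genotype from leaking into training, which is why CV1 is usually the harder prediction problem of the two.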

16
Haplotype-resolved Genome Assemblies of Hybrid Wheatgrass and Bluebunch Wheatgrass Reveal the Stepwise Polyploid Origin and Biased Subgenome Dominance

Ji, Y.; Chaudhary, R.; Khan, N.; Perumal, S.; Wang, Z.; Moghanloo, L.; Hucl, P.; Biligetu, B.; Sharpe, A. G.; Jin, L.

2026-03-27 genomics 10.64898/2026.03.27.714782 medRxiv
Top 0.7%
0.5%

Concerns over climate change have intensified the demand for stress-resistant crops like hybrid wheatgrass (HWG; Elymus hoffmannii, StStStStHH), a perennial forage species known for its exceptional salt and drought tolerance. However, hexaploidy and high heterozygosity have complicated efforts to resolve its genomic structure and evolutionary history. Here, we present high-quality, haplotype-resolved, chromosome-level genome assemblies for HWG (CDC Saltking) and its putative progenitor, bluebunch wheatgrass (Pseudoroegneria spicata). By integrating PacBio HiFi and ultra-long Oxford Nanopore sequencing with Hi-C scaffolding, we assembled the 10.7 Gb HWG genome into 21 pseudochromosomes per haplotype. Our phylogenomic analysis redefines the origin of the H subgenome, positioning it as an intermediate between Old-World Hordeum marinum (sea barley) and Hordeum brevisubulatum. Notably, we identified significant chromosomal rearrangements, including a unique duplication on St chromosome 4. Transcriptome analysis across multiple tissues revealed a pronounced expression dominance of the H subgenome. This dominance was not associated with reduced LTR density, suggesting that selective pressures for rapid adaptation of the latest subgenome entrant may drive its dominance. Finally, using the f-branch statistic, population genomic analysis of 189 accessions representing eight Elymus and Pseudoroegneria species revealed extensive reticulate evolutionary relationships and identified P. spicata as a major, asymmetric genetic donor within the wheatgrass complex. These resources provide a foundational framework for future genomic research and genetic improvement in grasses and for the introgression of stress-tolerance traits into cereal crops such as wheat.

Key Messages: Development of world-first high-quality chromosomal-level haplotype-resolved genome assemblies of hexaploid HWG and its diploid progenitor, Pseudoroegneria spicata, enabled the identification of the subgenome origins. This study resolved the evolutionary placement of the St genome and clarified the history of polyploidization and hybridization in HWG. Homeolog expression bias in the H subgenome likely reflects selective pressure favoring greater gene retention and upregulation of functionally important genes, thereby enhancing hybrid fitness. Population structure analysis distinctly differentiates P. spicata, E. repens, and E. hoffmannii from other European Pseudoroegneria species. The findings reveal the complex patterns of interspecific gene flow and population dynamics within the Elymus and Pseudoroegneria species.

17
Next-Generation Soybean Haplotype Map as A Genomic Resource for Enhanced Trait Discovery and Functional Analysis

Khan, A. W.; Doddamani, D.; Song, Q.; Vuong, T. D.; Chhapekar, S. S.; Ye, H.; Garg, V.; Varshney, R. K.; Nguyen, H. T.

2026-03-26 genomics 10.64898/2026.03.24.713798 medRxiv
Top 0.7%
0.5%

We present a global soybean haplotype map generated from whole-genome sequencing of 1,278 Glycine max and Glycine soja accessions, comprising 11.37 million SNPs and 2.05 million short insertions and deletions. This map (GmHapMap-II) captures unprecedented worldwide genetic diversity, reflecting the broad extent of the global soybean gene pool. Population structure analyses revealed six geographically distinct subpopulations that affected the linkage and shaped the recombination. The haplotype variation map was used to identify novel genomic regions associated with crude protein content on chromosome 15 that were not detected by a lower SNP density array. LD-based haplotype analysis revealed a superior haplotype for crude protein content. The constructed haplotype map enabled detailed characterization of haplotype diversity and copy number polymorphism at the SCN-associated rhg-1 and Rhg-4 loci, revealing both novel haplotype structures and germplasm lines with elevated CNV relative to previously characterized genotypes. We employed the HapMap matrix for a multi-class variations ML-based genomic prediction approach to predict phenotypes for SCN and catalogued the gene-centric haplotypes in a user-friendly database. The analysis revealed the extent of deleterious alleles present in the soybean germplasm and how breeders have deployed beneficial alleles and purged deleterious ones. The haplotype map will serve as a major genomic resource for trait-based mapping, enhancing efforts in the genomics-enabled development of improved cultivars.

18
Easy-to-use whole-genome sequencing workflows and standardized practices to uncover hidden genetic variation in Synechocystis PCC 6803 wild-type and knock-out strains

Theune, M.; Fritsche, R.; Kueppers, N.; Boehm, M.; Kolkhof, P.; Paul, F.; Popa, O.; Oldenburg, E.; Wiegard, A.; Axmann, I. M.; Gutekunst, K.

2026-04-08 microbiology 10.64898/2026.04.08.717167 medRxiv
Top 0.9%
0.3%

Knock-out mutants are often used to study gene function by disrupting a specific gene and comparing the mutant to a wild-type strain. Reliable interpretation, however, requires that the two strains differ only by the intended mutation and that the observed phenotype is caused specifically by the deleted gene. In the highly polyploid cyanobacterium Synechocystis sp. PCC 6803, this is particularly challenging because incomplete segregation can mask genetic heterogeneity or secondary suppressor mutations. The genetic variation among laboratory wild-type lines can further confound phenotypic analyses. We show that these challenges can be addressed by routine strain validation via whole-genome sequencing (WGS). To this end, we developed and tested user-friendly workflows for short-read (Illumina), long-read (Oxford Nanopore Technologies; ONT), and hybrid data, providing standardized quality control, variant calling, and structural variant detection. We benchmarked their performance in detecting single-nucleotide polymorphisms (SNPs), small indels, and structural variants using simulated datasets across different coverages and mixed populations. Applying the workflows to three Synechocystis sp. PCC 6803 wild-type lines revealed multiple sequence and structural differences relative to the reference genome, including previously undescribed genetic variants, underscoring the importance of documenting the strain background and the value of long-read sequencing. Characterization of two independent 6-phosphogluconate dehydrogenase (gnd) knock-out mutants and their complemented strains highlighted how a failed rescue can reveal a phenotype unrelated to the intended knock-out. An automated literature analysis revealed that only a minority of the investigated Synechocystis studies that used knock-out mutants included complementation as a control (39%), whereas this practice is more common in studies involving Escherichia coli (63%) and Saccharomyces cerevisiae (55%).
Based on these results, we propose a practical guide for standardizing knock-out phenotyping in Synechocystis PCC 6803. Combined with accessible workflows for routine whole-genome validation, this framework aims to support more robust and reproducible knock-out studies in the future.

19
Paralytic Shellfish Toxin production in Alexandrium minutum (Dinophyceae): insights from omics integration using toxigenic and non-toxigenic recombinant progeny

Mary, L.; Quere, J.; Latimier, M.; Artigaud, S.; Hegaret, H.; Le Gac, M.; Reveillon, D.

2026-03-26 genomics 10.64898/2026.03.24.713948 medRxiv
Top 0.9%
0.3%

Paralytic Shellfish Toxins (PSTs) are produced by certain species of cyanobacteria and dinoflagellates. Part of the PST biosynthetic pathway has been elucidated in cyanobacteria, and the involvement of some sxt genes has been confirmed by experimental studies. In contrast to cyanobacteria, knowledge about PST biosynthesis in dinoflagellates is more limited and generally restricted to comparative studies with the cyanobacterial pathway. To investigate the specificity of the PST pathway in dinoflagellates, 16 toxic and non-toxic A. minutum strains from a recombinant cross were compared, without prior assumptions about the genes or metabolites involved in PST synthesis, using an integrative approach combining untargeted metabolomic and transcriptomic data. Among the 60 most distinguishing transcripts between toxic and non-toxic strains, only 3 sxt genes were present: sxtA4, sxtG, and sxtI. In contrast, non-sxt homologs were detected as highly discriminant between these two phenotypes. More specifically, a phyH homolog may act as the analog of sxtS found in cyanobacteria. Moreover, we identified four putative synthetic PST intermediates. Among these, Int-C2 correlated with the toxic phenotype, whereas the 3 others were detected in both toxic and non-toxic strains, suggesting that these strains may share some parts of the biosynthetic pathway. Finally, our results showed that PST biosynthesis in dinoflagellates results from the activity of sxt genes, acquired by horizontal gene transfer from cyanobacteria, as well as from other genes not acquired from cyanobacteria, such as phyH.

20
Transposable elements as new players to decipher sex differences in Parkinson Disease

Gordillo-Gonzalez, F.; Galiana-Rosello, C.; Grillo-Risco, R.; Soler-Saez, I.; Hidalgo, M. R.; Siomi, H.; Kobayashi-Ishihara, M.; Garcia-Garcia, F.

2026-03-30 bioinformatics 10.64898/2026.03.27.714370 medRxiv
Top 1%
0.3%

We present a novel integrative analysis of transposable elements (TEs) in 4 single cell RNA-seq (scRNA-seq) datasets of postmortem substantia nigra pars compacta samples from Parkinson Disease (PD) patients and matched healthy controls, with the objective of building a trustworthy cell-type-specific atlas of TEs that may clarify the role of TEs in sex differences in PD. We used the soloTE tool to evaluate TE expression changes across all snRNA-seq studies identified in our previous systematic review, and then integrated the results using meta-analysis techniques. Finally, we evaluated possible associations between TEs and protein-coding genes by integrating our previous results on this matter with the TE information obtained, in order to propose possible mechanisms of action by which some of the TEs contribute to PD.